Results 1 - 20 of 72
1.
medRxiv ; 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38562836

ABSTRACT

Objectives: To synthesize discussions among sexual minority men and gender diverse (SMMGD) individuals on mpox, given limited representation of SMMGD voices in existing mpox literature. Methods: BERTopic (a topic modeling technique) was employed with human validations to analyze mpox-related tweets (n = 8,688; October 2020-September 2022) from 2,326 self-identified SMMGD individuals in the U.S.; followed by content analysis and geographic analysis. Results: BERTopic identified 11 topics: health activism (29.81%); mpox vaccination (25.81%) and adverse events (0.98%); sarcasm, jokes, emotional expressions (14.04%); COVID-19 and mpox (7.32%); government/public health response (6.12%); mpox symptoms (2.74%); case reports (2.21%); puns on the virus's naming (i.e., monkeypox; 0.86%); media publicity (0.68%); mpox in children (0.67%). Mpox health activism negatively correlated with LGB social climate index at U.S. state level, ρ = -0.322, p = 0.031. Conclusions: SMMGD discussions on mpox encompassed utilitarian themes (e.g., vaccine access, case reports, mpox symptoms) and emotionally charged themes (advocating against homophobia, misinformation, and stigma). Mpox health activism was more prevalent in states with lower LGB social acceptance. Public Health Implications: Findings illuminate SMMGD engagement with mpox discourse, underscoring the need for more inclusive health communication strategies in infectious disease outbreaks to control associated stigma.

2.
J Biomed Inform ; 151: 104618, 2024 03.
Article in English | MEDLINE | ID: mdl-38431151

ABSTRACT

OBJECTIVE: Goals of care (GOC) discussions are an increasingly used quality metric in serious illness care and research. Wide variation in documentation practices within the Electronic Health Record (EHR) presents challenges for reliable measurement of GOC discussions. Novel natural language processing approaches are needed to capture GOC discussions documented in real-world samples of seriously ill hospitalized patients' EHR notes, a corpus with a very low event prevalence. METHODS: To automatically detect sentences documenting GOC discussions outside of dedicated GOC note types, we proposed an ensemble of classifiers aggregating the predictions of rule-based, feature-based, and three transformer-based classifiers. We trained our classifier on 600 manually annotated EHR notes among patients with serious illnesses. Our corpus exhibited an extremely imbalanced ratio between sentences discussing GOC and sentences that do not. This ratio challenges standard supervision methods to train a classifier. Therefore, we trained our classifier with active learning. RESULTS: Using active learning, we reduced the annotation cost to fine-tune our ensemble by 70% while improving its performance in our test set of 176 EHR notes, with 0.557 F1-score for sentence classification and 0.629 for note classification. CONCLUSION: When classifying notes, with a true positive rate of 72% (13/18) and false positive rate of 8% (13/158), our performance may be sufficient for deploying our classifier in the EHR to facilitate bedside clinicians' access to GOC conversations documented outside of dedicated note types, without overburdening clinicians with false positives. Improvements are needed before using it to enrich trial populations or as an outcome measure.


Subjects
Communication , Documentation , Humans , Electronic Health Records , Natural Language Processing , Patient Care Planning
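
The note-level rates quoted in the abstract above follow directly from the counts it gives in parentheses. A minimal Python sketch of that arithmetic follows; the function and variable names are illustrative assumptions, and only the four counts come from the abstract.

```python
def rate(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, rejecting an empty denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


# Counts taken from the abstract: 13 of 18 notes with a GOC discussion were
# detected, and 13 of 158 notes without one were flagged.
true_positives, positive_notes = 13, 18
false_positives, negative_notes = 13, 158

tpr = rate(true_positives, positive_notes)    # true positive rate (recall)
fpr = rate(false_positives, negative_notes)   # false positive rate

print(f"TPR = {tpr:.1%}")  # prints "TPR = 72.2%"
print(f"FPR = {fpr:.1%}")  # prints "FPR = 8.2%"
```

The sketch does not try to reproduce the reported F1-scores, since the abstract does not say how they were aggregated.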
3.
J Med Internet Res ; 26: e47923, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38488839

ABSTRACT

BACKGROUND: Patient health data collected from a variety of nontraditional resources, commonly referred to as real-world data, can be a key information source for health and social science research. Social media platforms, such as Twitter (Twitter, Inc), offer vast amounts of real-world data. An important aspect of incorporating social media data in scientific research is identifying the demographic characteristics of the users who posted those data. Age and gender are considered key demographics for assessing the representativeness of the sample and enable researchers to study subgroups and disparities effectively. However, deciphering the age and gender of social media users poses challenges. OBJECTIVE: This scoping review aims to summarize the existing literature on the prediction of the age and gender of Twitter users and provide an overview of the methods used. METHODS: We searched 15 electronic databases and carried out reference checking to identify relevant studies that met our inclusion criteria: studies that predicted the age or gender of Twitter users using computational methods. The screening process was performed independently by 2 researchers to ensure the accuracy and reliability of the included studies. RESULTS: Of the initial 684 studies retrieved, 74 (10.8%) studies met our inclusion criteria. Among these 74 studies, 42 (57%) focused on predicting gender, 8 (11%) focused on predicting age, and 24 (32%) predicted a combination of both age and gender. Gender prediction was predominantly approached as a binary classification task, with the reported performance of the methods ranging from 0.58 to 0.96 F1-score or 0.51 to 0.97 accuracy. Age prediction approaches varied in terms of classification groups, with a higher range of reported performance, ranging from 0.31 to 0.94 F1-score or 0.43 to 0.86 accuracy. 
The heterogeneous nature of the studies and the reporting of dissimilar performance metrics made it challenging to quantitatively synthesize results and draw definitive conclusions. CONCLUSIONS: Our review found that although automated methods for predicting the age and gender of Twitter users have evolved to incorporate techniques such as deep neural networks, a significant proportion of the attempts rely on traditional machine learning methods, suggesting that there is potential to improve the performance of these tasks by using more advanced methods. Gender prediction has generally achieved a higher reported performance than age prediction. However, the lack of standardized reporting of performance metrics or standard annotated corpora to evaluate the methods used hinders any meaningful comparison of the approaches. Potential biases stemming from the collection and labeling of data used in the studies were identified as a problem, emphasizing the need for careful consideration and mitigation of biases in future studies. This scoping review provides valuable insights into the methods used for predicting the age and gender of Twitter users, along with the challenges and considerations associated with these methods.


Subjects
Social Media , Humans , Young Adult , Adult , Reproducibility of Results , Neural Networks, Computer , Machine Learning
4.
J Med Internet Res ; 26: e50652, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38526542

ABSTRACT

We manually annotated 9734 tweets that were posted by users who reported their pregnancy on Twitter, and used them to train, evaluate, and deploy deep neural network classifiers (F1-score=0.93) to detect tweets that report having a child with attention-deficit/hyperactivity disorder (678 users), autism spectrum disorders (1744 users), delayed speech (902 users), or asthma (1255 users), demonstrating the potential of Twitter as a complementary resource for assessing associations between pregnancy exposures and childhood health outcomes on a large scale.


Subjects
Asthma , Autism Spectrum Disorder , Social Media , Child , Female , Pregnancy , Humans , Asthma/epidemiology , Neural Networks, Computer
6.
J Am Med Inform Assoc ; 31(4): 991-996, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38218723

ABSTRACT

OBJECTIVE: The aim of the Social Media Mining for Health Applications (#SMM4H) shared tasks is to take a community-driven approach to address the natural language processing and machine learning challenges inherent to utilizing social media data for health informatics. In this paper, we present the annotated corpora, a technical summary of participants' systems, and the performance results. METHODS: The eighth iteration of the #SMM4H shared tasks was hosted at the AMIA 2023 Annual Symposium and consisted of 5 tasks that represented various social media platforms (Twitter and Reddit), languages (English and Spanish), methods (binary classification, multi-class classification, extraction, and normalization), and topics (COVID-19, therapies, social anxiety disorder, and adverse drug events). RESULTS: In total, 29 teams registered, representing 17 countries. In general, the top-performing systems used deep neural network architectures based on pre-trained transformer models. In particular, the top-performing systems for the classification tasks were based on single models that were pre-trained on social media corpora. CONCLUSION: To facilitate future work, the datasets (a total of 61 353 posts) will remain available by request, and the CodaLab sites will remain active for a post-evaluation phase.


Subjects
Social Media , Humans , Data Mining/methods , Neural Networks, Computer , Natural Language Processing , Machine Learning
7.
Eur Heart J ; 45(5): 332-345, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38170821

ABSTRACT

Natural language processing techniques are having an increasing impact on clinical care from patient, clinician, administrator, and research perspectives. Applications include automated generation of clinical notes and discharge letters, medical term coding for billing, medical chatbots for both patients and clinicians, data enrichment in the identification of disease symptoms or diagnoses, cohort selection for clinical trials, and auditing. This review presents an overview of the history of natural language processing techniques, with brief technical background. It then discusses implementation strategies for natural language processing tools, focusing specifically on large language models, and concludes with future opportunities for applying such techniques in the field of cardiology.


Subjects
Artificial Intelligence , Cardiology , Humans , Natural Language Processing , Patient Discharge
9.
medRxiv ; 2024 Jan 03.
Article in English | MEDLINE | ID: mdl-37904943

ABSTRACT

Background: Phenotypes identified during dysmorphology physical examinations are critical to genetic diagnosis and nearly universally documented as free-text in the electronic health record (EHR). Variation in how phenotypes are recorded in free-text makes large-scale computational analysis extremely challenging. Existing natural language processing (NLP) approaches to address phenotype extraction are trained largely on the biomedical literature or on case vignettes rather than actual EHR data. Methods: We implemented a tailored system at the Children's Hospital of Philadelphia that allows clinicians to document dysmorphology physical exam findings. From the underlying data, we manually annotated a corpus of 3136 organ system observations using the Human Phenotype Ontology (HPO). We provide this corpus publicly. We trained a transformer-based NLP system to identify HPO terms from exam observations. The pipeline includes an extractor, which identifies tokens in the sentence expected to contain an HPO term, and a normalizer, which uses those tokens together with the original observation to determine the specific term mentioned. Findings: We find that our labeler and normalizer NLP pipeline, which we call PhenoID, achieves state-of-the-art performance for the dysmorphology physical exam phenotype extraction task. PhenoID's performance on the test set was 0.717, compared to the nearest baseline system (Pheno-Tagger) performance of 0.633. An analysis of our system's normalization errors shows possible imperfections in the HPO terminology itself but also reveals a lack of semantic understanding by our transformer models. Interpretation: Transformer-based NLP models are a promising approach to genetic phenotype extraction and, with recent development of larger pre-trained causal language models, may improve semantic understanding in the future. We believe our results also have direct applicability to more general extraction of medical signs and symptoms. 
Funding: US National Institutes of Health.

10.
Drug Saf ; 47(1): 81-91, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37995049

ABSTRACT

INTRODUCTION: Hypertension is the leading cause of heart disease in the world, and discontinuation of or nonadherence to antihypertensive medication constitutes a significant global health concern. Patients with hypertension have high rates of medication nonadherence. Studies of reasons for nonadherence using traditional surveys are limited, can be expensive, and suffer from response, white-coat, and recall biases. Mining relevant posts by patients on social media is inexpensive and less impacted by the pressures and biases of formal surveys, which may provide direct insights into factors that lead to non-compliance with antihypertensive medication. METHODS: This study examined medication ratings posted to WebMD, an online health forum that allows patients to post medication reviews. We used a previously developed natural language processing classifier to extract indications and reasons for changes in angiotensin II receptor blocker (ARB) and angiotensin-converting enzyme inhibitor (ACEI) treatments. After extraction, ratings were manually annotated and compared with data from the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) public database. RESULTS: From a collection of 343,459 WebMD reviews, we automatically extracted 1867 posts mentioning changes in ACEIs or ARBs, and manually reviewed the 300 most recent posts regarding ACEI treatments and the 300 most recent posts regarding ARB treatments. After excluding posts that only mentioned a dose change or were a false-positive mention, 142 posts in the ARBs dataset and 187 posts in the ACEIs dataset remained. The majority of posts (97% ARBs, 91% ACEIs) indicated experiencing an adverse event as the reason for medication change. 
The most common adverse events reported mapped to the Medical Dictionary for Regulatory Activities were "musculoskeletal and connective tissue disorders" like muscle and joint pain for ARBs, and "respiratory, thoracic, and mediastinal disorders" like cough and shortness of breath for ACEIs. These categories also had the largest differences in percentage points, appearing more frequently on WebMD data than FDA data (p < 0.001). CONCLUSION: Musculoskeletal and respiratory symptoms were the most commonly reported adverse effects in social media postings associated with drug discontinuation. Managing such symptoms is a potential target of interventions seeking to improve medication persistence.


Subjects
Hypertension , Social Media , Humans , Antihypertensive Agents/adverse effects , Angiotensin-Converting Enzyme Inhibitors/adverse effects , Angiotensin Receptor Antagonists/therapeutic use , Hypertension/drug therapy , Patient Reported Outcome Measures
11.
medRxiv ; 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-38045356

ABSTRACT

Background: Preterm birth, defined as birth at <37 weeks of gestation, is the leading cause of neonatal death globally and, together with low birthweight, the second leading cause of infant mortality in the United States. There is mounting evidence that COVID-19 infection during pregnancy is associated with an increased risk of preterm birth; however, data remain limited by trimester of infection. The ability to study COVID-19 infection during the earlier stages of pregnancy has been limited by available sources of data. The objective of this study was to use self-reports in large-scale, longitudinal social media data to assess the association between trimester of COVID-19 infection and preterm birth. Methods: In this retrospective cohort study, we used natural language processing and machine learning, followed by manual validation, to identify pregnant Twitter users and to search their longitudinal collection of publicly available tweets for reports of COVID-19 infection during pregnancy and, subsequently, a preterm birth or term birth (i.e., a gestational age ≥37 weeks) outcome. Among the users who reported their pregnancy on Twitter, we also identified a 1:1 age-matched control group, consisting of users with a due date prior to January 1, 2020 (that is, without COVID-19 infection during pregnancy). We calculated the odds ratios (ORs) with 95% confidence intervals (CIs) to compare the overall rates of preterm birth for pregnancies with and without COVID-19 infection and by timing of infection: first trimester (weeks 1-13), second trimester (weeks 14-27), or third trimester (weeks 28-36). Results: Through August 2022, we identified 298 Twitter users who reported COVID-19 infection during pregnancy, a preterm birth or term birth outcome, and maternal age: 94 (31.5%) with first-trimester infection, 110 (36.9%) second-trimester infection, and 95 (31.9%) third-trimester infection. 
In total, 26 (8.8%) of these 298 users reported preterm birth: 8 (8.5%) were infected during the first trimester, 7 (6.4%) were infected during the second trimester, and 12 (12.6%) were infected during the third trimester. In the 1:1 age-matched control group, 13 (4.4%) of the 298 users reported preterm birth. Overall, the risk of preterm birth was significantly higher for pregnancies with COVID-19 infection compared to those without (OR 2.1, 95% CI 1.06-4.16). In particular, the risk of preterm birth was significantly higher for pregnancies with COVID-19 infection during the third trimester (OR 3.17, CI 1.39-7.21). Conclusion: The results of our study suggest that COVID-19 infection particularly during the third trimester is associated with an increased risk of preterm birth.
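
The odds ratios quoted in the abstract above can be approximated from the counts it reports. The Python sketch below assumes an unadjusted 2x2 table and a Woolf (log-based) 95% confidence interval; the abstract does not state which interval method the authors used, and the function name is hypothetical. The third-trimester comparison is assumed to use the full control group of 298 users.

```python
import math


def odds_ratio_ci(exposed_cases, exposed_total, control_cases, control_total, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    Assumption: this is a generic textbook estimator, not necessarily the
    procedure used in the study.
    """
    a = exposed_cases                      # exposed, preterm
    b = exposed_total - exposed_cases      # exposed, term
    c = control_cases                      # unexposed, preterm
    d = control_total - control_cases      # unexposed, term
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


# Counts from the abstract: 26 of 298 infected users and 13 of 298 controls
# reported preterm birth; 12 of 95 third-trimester infections were preterm.
overall = odds_ratio_ci(26, 298, 13, 298)
third_trimester = odds_ratio_ci(12, 95, 13, 298)

for label, (or_, lo, hi) in [("overall", overall), ("third trimester", third_trimester)]:
    print(f"{label}: OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Under these assumptions the estimates should land close to the reported values (OR 2.1, CI 1.06-4.16 overall; OR 3.17, CI 1.39-7.21 for the third trimester).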

12.
medRxiv ; 2023 Nov 08.
Article in English | MEDLINE | ID: mdl-37986776

ABSTRACT

The aim of the Social Media Mining for Health Applications (#SMM4H) shared tasks is to take a community-driven approach to address the natural language processing and machine learning challenges inherent to utilizing social media data for health informatics. The eighth iteration of the #SMM4H shared tasks was hosted at the AMIA 2023 Annual Symposium and consisted of five tasks that represented various social media platforms (Twitter and Reddit), languages (English and Spanish), methods (binary classification, multi-class classification, extraction, and normalization), and topics (COVID-19, therapies, social anxiety disorder, and adverse drug events). In total, 29 teams registered, representing 18 countries. In this paper, we present the annotated corpora, a technical summary of the systems, and the performance results. In general, the top-performing systems used deep neural network architectures based on pre-trained transformer models. In particular, the top-performing systems for the classification tasks were based on single models that were pre-trained on social media corpora. To facilitate future work, the datasets (a total of 61,353 posts) will remain available by request, and the CodaLab sites will remain active for a post-evaluation phase.

13.
medRxiv ; 2023 Aug 04.
Article in English | MEDLINE | ID: mdl-37577535

ABSTRACT

Many studies require researchers to extract specific information from the published literature, such as details about sequence records or about a randomized controlled trial. While manual extraction is cost efficient for small studies, larger studies such as systematic reviews are much more costly and time-consuming. To avoid exhaustive manual searches and extraction, and their related cost and effort, natural language processing (NLP) methods can be tailored for the more subtle extraction and decision tasks that typically only humans have performed. The need for such studies that use the published literature as a data source became even more evident as the COVID-19 pandemic raged through the world and millions of sequenced samples were deposited in public repositories such as GISAID and GenBank. These records promised large genomic epidemiology studies but, more often than not, lacked important details, which prevented large-scale studies. Thus, granular geographic location or the most basic patient-relevant data, such as demographic information or clinical outcomes, were not noted in the sequence record. However, some of these data were indeed published, but in the text, tables, or supplementary material of a corresponding published article. We present here methods to identify relevant journal articles that report having produced new SARS-CoV-2 sequences and made them available in GenBank or GISAID, as the articles that initially produced and made available the sequences are the most likely to include the high-level details about the patients from whom the sequences were obtained. Human annotators validated the approach, creating a gold standard set for training and validation of a machine learning classifier. 
Identifying these articles is a crucial step to enable future automated informatics pipelines that will apply Machine Learning and Natural Language Processing to identify patient characteristics such as co-morbidities, outcomes, age, gender, and race, enriching SARS-CoV-2 sequence databases with actionable information for defining large genomic epidemiology studies. Thus, enriched patient metadata can enable secondary data analysis, at scale, to uncover associations between the viral genome (including variants of concern and their sublineages), transmission risk, and health outcomes. However, for such enrichment to happen, the right papers need to be found and very detailed data needs to be extracted from them. Further, finding the very specific articles needed for inclusion is a task that also facilitates scoping and systematic reviews, greatly reducing the time needed for full-text analysis and extraction.

14.
JMIR Res Protoc ; 12: e47068, 2023 Aug 02.
Article in English | MEDLINE | ID: mdl-37531158

ABSTRACT

BACKGROUND: Adverse drug events (ADEs) are a considerable public health burden resulting in disability, hospitalization, and death. Even those ADEs deemed nonserious can severely impact a patient's quality of life and adherence to intervention. Monitoring medication safety, however, is challenging. Social media may be a useful adjunct for obtaining real-world data on ADEs. While many studies have been undertaken to detect adverse events on social media, a consensus has not yet been reached as to the value of social media in pharmacovigilance or its role in pharmacovigilance in relation to more traditional data sources. OBJECTIVE: The aim of the study is to evaluate and characterize the use of social media in ADE detection and pharmacovigilance as compared to other data sources. METHODS: A scoping review will be undertaken. We will search 11 bibliographical databases as well as Google Scholar, hand-searching, and forward and backward citation searching. Records will be screened in Covidence by 2 independent reviewers at both title and abstract stage as well as full text. Studies will be included if they used any type of social media (such as Twitter or patient forums) to detect any type of adverse event associated with any type of medication and then compared the results from social media to any other data source (such as spontaneous reporting systems or clinical literature). Data will be extracted using a data extraction sheet piloted by the authors. Important data on the types of methods used (such as machine learning), any limitations of the methods used, types of adverse events and drugs searched for and included, availability of data and code, details of the comparison data source, and the results and conclusions will be extracted. 
RESULTS: We will present descriptive summary statistics as well as identify any patterns in the types and timing of ADEs detected, including but not limited to the similarities and differences in what is reported, gaps in the evidence, and the methods used to extract ADEs from social media data. We will also summarize how the data from social media compare to conventional data sources. The literature will be organized by the data source for comparison. Where possible, we will analyze the impact of the types of adverse events, the social media platform used, and the methods used. CONCLUSIONS: This scoping review will provide a valuable summary of a large body of research and important information for pharmacovigilance as well as suggest future directions of further research in this area. Through the comparisons with other data sources, we will be able to draw conclusions about the added value of social media in monitoring adverse events of medications, in terms of type of adverse events and timing. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): PRR1-10.2196/47068.

16.
BioData Min ; 16(1): 20, 2023 Jul 13.
Article in English | MEDLINE | ID: mdl-37443040

ABSTRACT

The introduction of large language models (LLMs) that allow iterative "chat" in late 2022 is a paradigm shift that enables generation of text often indistinguishable from that written by humans. LLM-based chatbots have immense potential to improve academic work efficiency, but the ethical implications of their fair use and inherent bias must be considered. In this editorial, we discuss this technology from the academic's perspective with regard to its limitations and utility for academic writing, education, and programming. We end with our stance with regard to using LLMs and chatbots in academia, which is summarized as (1) we must find ways to effectively use them, (2) their use does not constitute plagiarism (although they may produce plagiarized text), (3) we must quantify their bias, (4) users must be cautious of their poor accuracy, and (5) the future is bright for their application to research and as an academic tool.

17.
JAMA Netw Open ; 6(7): e2323746, 2023 07 03.
Article in English | MEDLINE | ID: mdl-37459097

ABSTRACT

Importance: Selective serotonin reuptake inhibitors (SSRIs) are a commonly prescribed medication class to treat a variety of mental disorders. However, adherence to SSRIs is low, and uncovering the reasons for discontinuation among SSRI users is an important first step to improving medication persistence. Objective: To identify the reasons SSRIs are discontinued or changed, as reported by patients and caregivers in online drug reviews. Design, Setting, and Participants: This qualitative study used natural language processing and machine learning to extract mentions of changes in SSRI intake from 667 drug reviews posted on the online health forum WebMD from September 1, 2007, to August 31, 2021. The type of medication change, including discontinuation, switch to another medication, or dose change, and the reason for the change were manually annotated. In each instance in which an adverse event was reported, the event was categorized using Medical Dictionary for Regulatory Activities primary system organ class (SOC) codes, and its relative frequency was compared with that in spontaneous reporting systems maintained by the US Food and Drug Administration and the UK Medicines and Healthcare Products Regulatory Agency. Main Outcomes and Measures: Reasons for SSRI medication change as assessed using SOC codes. Results: In total, 667 reviews posted by 659 patients or caregivers (516 [78%] of patients were female; 410 [62%] 25-54 years of age) were identified that indicated a medication change: 335 posts indicated SSRI discontinuation, 188 posts indicated dose change, and 179 posts indicated switched medications. Most authors (625 [95%]) were patients. The most common reason for medication discontinuation or switching was adverse events experienced, and the most common reason for dose change was titration. Both uptitration and downtitration were initiated by either a health care professional or patient. 
The most common adverse events were classified by SOC codes as psychiatric disorders, including insomnia, loss of libido, and anxiety. Compared with those in regulatory data, psychiatric adverse events, adverse events recorded by investigations (mostly weight gain) and adverse events associated with the reproductive system (mostly erectile dysfunction) were reported disproportionately more often. Conclusions and Relevance: This qualitative study of online drug reviews found that useful information was provided directly by patients or their caregivers regarding their medication behavior, specifically, information regarding SSRI treatment changes that may inform interventions to improve adherence. These findings suggest that these reported adverse events may be associated with SSRI persistence and that people may feel more inclined to report such events on social media than to clinicians or regulatory agencies.


Subjects
Mental Disorders , Selective Serotonin Reuptake Inhibitors , United States , Male , Humans , Female , Selective Serotonin Reuptake Inhibitors/adverse effects , Pharmaceutical Preparations , Mental Disorders/drug therapy , Anxiety
18.
medRxiv ; 2023 Jul 16.
Article in English | MEDLINE | ID: mdl-37503241

ABSTRACT

Background: Since the onset of the COVID-19 pandemic, there has been an unprecedented effort in genomic epidemiology to sequence the SARS-CoV-2 virus and examine its molecular evolution. This has been facilitated by the availability of publicly accessible databases, GISAID and GenBank, which collectively hold millions of SARS-CoV-2 sequence records. However, genomic epidemiology seeks to go beyond phylogenetic analysis by linking genetic information to patient demographics and disease outcomes, enabling a comprehensive understanding of transmission dynamics and disease impact. While these repositories include some patient-related information, such as the location of the infected host, the granularity of this data and the inclusion of demographic and clinical details are inconsistent. Additionally, the extent to which patient-related metadata is reported in published sequencing studies remains largely unexplored. Therefore, it is essential to assess the extent and quality of patient-related metadata reported in SARS-CoV-2 sequencing studies. Moreover, there is limited linkage between published articles and sequence repositories, hindering the identification of relevant studies. Traditional search strategies based on keywords may miss relevant articles. To overcome these challenges, this study proposes the use of an automated classifier to identify relevant articles. Objective: This study aims to conduct a systematic and comprehensive scoping review, along with a bibliometric analysis, to assess the reporting of patient-related metadata in SARS-CoV-2 sequencing studies. Methods: The NIH's LitCovid collection will be used for the machine learning classification, while an independent search will be conducted in PubMed. Data extraction will be conducted using Covidence, and the extracted data will be synthesized and summarized to quantify the availability of patient metadata in the published literature of SARS-CoV-2 sequencing studies. 
For the bibliometric analysis, relevant data points, such as author affiliations, journal information, and citation metrics, will be extracted. Results: The study will report findings on the extent and types of patient-related metadata reported in genomic viral sequencing studies of SARS-CoV-2. The scoping review will identify gaps in the reporting of patient metadata and make recommendations for improving the quality and consistency of reporting in this area. The bibliometric analysis will uncover trends and patterns in the reporting of patient-related metadata, such as differences in reporting based on study types or geographic regions. Co-occurrence networks of author keywords will also be presented to highlight frequent themes and their associations with patient metadata reporting. Conclusion: This study will contribute to advancing knowledge in the field of genomic epidemiology by providing a comprehensive overview of the reporting of patient-related metadata in SARS-CoV-2 sequencing studies. The insights gained from this study may help improve the quality and consistency of reporting patient metadata, enhancing the utility of sequence metadata and facilitating future research on infectious diseases. The findings may also inform the development of machine learning methods to automatically extract patient-related information from sequencing studies.

19.
Database (Oxford) ; 2023, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36734300

ABSTRACT

This study presents the outcomes of the shared task competition BioCreative VII (Task 3) focusing on the extraction of medication names from a Twitter user's publicly available tweets (the user's 'timeline'). In general, detecting health-related tweets is notoriously challenging for natural language processing tools. The main challenge, aside from the informality of the language used, is that people tweet about any and all topics, and most of their tweets are not related to health. Thus, finding those tweets in a user's timeline that mention specific health-related concepts such as medications requires addressing extreme imbalance. Task 3 called for detecting tweets in a user's timeline that mention a medication name and, for each detected mention, extracting its span. The organizers made available a corpus consisting of 182 049 tweets publicly posted by 212 Twitter users with all medication mentions manually annotated. The corpus exhibits the natural distribution of positive tweets, with only 442 tweets (0.2%) mentioning a medication. This task was an opportunity for participants to evaluate methods that are robust to class imbalance beyond the simple lexical match. A total of 65 teams registered, and 16 teams submitted a system run. This study summarizes the corpus created by the organizers and the approaches taken by the participating teams for this challenge. The corpus is freely available at https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-3/. The methods and the results of the competing systems are analyzed with a focus on the approaches taken for learning from class-imbalanced data.


Subjects
Data Mining , Natural Language Processing , Humans , Data Mining/methods
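
To make the class imbalance described in the abstract above concrete, the Python sketch below uses only the two corpus counts it reports and scores a hypothetical baseline that never predicts a medication mention. The baseline is an illustration, not one of the submitted systems.

```python
# Corpus counts reported in the abstract.
total_tweets = 182_049
positive_tweets = 442

prevalence = positive_tweets / total_tweets

# Hypothetical baseline: label every tweet as "no medication mention".
true_positives = 0
false_negatives = positive_tweets
true_negatives = total_tweets - positive_tweets
false_positives = 0

accuracy = (true_positives + true_negatives) / total_tweets
recall = true_positives / (true_positives + false_negatives)

print(f"prevalence = {prevalence:.2%}")
print(f"baseline accuracy = {accuracy:.2%}")
print(f"baseline recall = {recall:.2%}")
```

Because the baseline is almost always right while finding no mentions at all, accuracy is a poor guide on data like this, which is why evaluation on such corpora usually focuses on the positive class.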